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Abstract 

In standard clinical within- subject analyses of event-related fMRI data, two steps are usually per- 
formed separately: detection of brain activity and estimation of the hemodynamic response. Because these 
two steps are inherently linked, we adopt the so-called region-based Joint Detection-Estimation (JDE) 
framework that addresses this joint issue using a multivariate inference for detection and estimation. JDE 
is built by making use of a regional bilinear generative model of the BOLD response and constraining 
the parameter estimation by physiological priors using temporal and spatial information in a Markovian 
modeling. In contrast to previous works that use Markov Chain Monte Carlo (MCMC) techniques 
to approximate the resulting intractable posterior distribution, we recast the JDE into a missing data 
framework and derive a Variational Expectation-Maximization (VEM) algorithm for its inference. A 
variational approximation is used to approximate the Markovian model in the unsupervised spatially 
adaptive JDE inference, which allows fine automatic tuning of spatial regularisation parameters. It follows 
a new algorithm that exhibits interesting properties compared to the previously used MCMC-based 
approach. Experiments on artificial and real data show that VEM-JDE is robust to model mis-specification 
and provides computational gain while maintaining good performance in terms of activation detection 
and hemodynamic shape recovery. 
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I. Introduction 

Functional Magnetic Resonance Imaging (fMRI) is a powerful tool to non-invasively study the relation- 
ship between a sensory or cognitive task and the ensuing evoked neural activity through the neurovascular 
coupling measured by the BOLD signal [1]. Since the 90's, this modality has become widely used in 
neuroimaging. Functional connectivity analyses aim at studying the interactions between signals and 
thus provide insight on integrative cerebral phenomena. In a complementary manner, we focus on the 
recovery of localization and dynamics of local evoked activity, thus on specialized cerebral processes. In 
this setting, the key issue is the modeling of the link between stimulation events and the induced BOLD 
effect throughout the brain. Physiological non-linear models [2, 3] are the most specific approaches to 
properly describe this link but their computational cost and their identifiability issues limit their use to a 
limited number of specific regions and to a few experimental conditions. The common approach, being 
the context of this paper, rather consists of linear systems which appear more robust and tractable. Here, 
the link between stimulation and BOLD effect is modelled through a convolutive system where each 
stimulus event induces a BOLD response, via the convolution of the binary stimulus sequence with the 
Hemodynamic Response Function (HRF). It follows two tasks for such BOLD analysis: the detection 
of where cerebral activity occurs and the estimation of its dynamics through the HRF identification. 
Commonly, the estimation part is ignored and the HRF is fixed to a canonical version which has been 
fitted on visual areas [4]. The detection task is performed by a General Linear Model (GLM), where 
stimulus-induced components are assumed to be known and only their relative weighting are to be 
recovered in the form of effect maps [5]. However, spatial intra-subject and between- subject variability 
of the response function has been highlighted [6,7], in addition to potential timing fluctuations induced 
by the paradigm (eg variations in delay). To take this variability into account, more flexibility can be 
injected in the GLM framework by adding more regressors. In a parametric setting, this amounts to 
adding a function basis, such as canonical HRF derivatives or a set of gamma functions. In a non- 
parametric setting, all HRF coefficients are explicitely encoded as a Finite Impulse Response [8]. The 
major drawback of these GLM extensions is the multiplicity of regressors for a given condition, so that 
the detection task is more difficult to perform and that statistical power is decreased. Moreover, the more 
coefficients to recover, the more ill-posed the problem becomes. The alternative approaches which aim 
at keeping a single regressor per condition and add also a temporal regularisation constraint to fix the 
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ill-posedness are the so-called regularized FIR methods [9, 10]. Still, they do not overcome the low signal- 
to-noise ratio inherent to BOLD signals, and they lack robustness especially in non-activating regions. 
All the issues encountered in the previously mentioned approaches are linked to the sequential treatment 
of the detection and estimation tasks. Indeed, these two problems are strongly linked: on the one hand, 
a precise localization of brain activated areas strongly depends on a reliable HRF model; on the other 
hand, a robust estimation of the HRF is only possible in activated areas where enough relevant signal is 
measured [11]. This consideration led to jointly perform these two tasks [12, 13] in a Joint Detection- 
Estimation (IDE) framework [14, 15] which is the basis of the approach developed in this paper. To 
improve the estimation robustness, a gain in HRF reproducibility is performed by spatially aggregating 
signals so that a constant HRF shape is locally considered across a small group of voxels, i.e. a region 
or a parcel. The procedure then implies a partitioning of the data into functionally homogeneous parcels, 
in the form of a cerebral parcellation. In brief, the IDE approach mainly rests upon: i) a non-parametric 
or FIR modeling of the HRF at this parcel-level for an unconstrained HRF shape; ii) prior information 
about the temporal smoothness of the HRF to guarantee a physiologically plausible shape; and Hi) the 
modeling of spatial correlation between neighboring voxels within each parcel using condition- specific 
discrete hidden Markov fields. In [15, 16], posterior inference is carried out in a Bayesian setting using a 
Markov Chain Monte Carlo (MCMC) method, which is computationally intensive and requires the fine 
tuning of several parameters. 

In this paper, we reformulate the complete IDE framework [15] as a missing data problem and 
propose a simplification of its estimation procedure. We resort to a variational approximation using 
a Variational Expectation Maximization (VEM) algorithm in order to derive estimates of the HRF and 
stimulus-related activity. Variational approximations have been widely and successfully employed in the 
context of fMRI analysis: to model auto-regressive noise in the context of a Bayesian GLM [17], to 
characterize cerebral hierarchical dynamic models [18], to model transient neuronal signals in a Bayesian 
dynamical system [14], or to perform inference of spatial mixture models for the segmentation of GLM 
effects maps [19]. As in our study, the primary objective of resorting to variational approximations is to 
alleviate the computational burden associated with stochastic MCMC approaches. Akin to [19], we aim 
at comparing the stochastic and variational based inference schemes but on the more complex matter 
of detecting activation and estimating the HRF whereas [19] treated only a detection (or segmentation) 
problem. 

Compared to a IDE MCMC implementation, the proposed approach does not require priors on the 
model parameters for inference to be carried out. However, for more robustness and to make the proposed 
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approach completely auto-calibrated, the adopted model may be extended by injecting additional priors 
on some of its parameters. 

Experiments on artificial and real data demonstrate the good performance of our VEM algorithm. 
Compared to the MCMC implementation, VEM is more computationally efficient, is robust to mis- 
specification of the parameters, to deviations from the model and adaptable to various experimental 
conditions. This increases considerably the potential impact of the JDE framework and makes its ap- 
plication to fMRI studies in neuroscience easier and more valuable. This new framework has also the 
advantage of providing straightforward criteria for model selection. 

The rest of this paper is organized as follows. In Section II, we introduce the hierarchical Bayesian 
model for the JDE framework in the within- subject fMRI context. In Section III, the VEM algorithm 
based on variational approximations for inference is described. Evaluation on real and artificial fMRI 
datasets are reported in Section IV and the performance comparison between the MCMC and VEM 
implementations is reported in Section V. Finally, Section VI concludes with a discussion of the points 
highlighted by the approach and areas for further research. 

II. Bayesian framework for the joint detection-estimation 

Matrices and vectors are denoted with bold upper and lower case letters (e.g. P and /x). A vector is 
by convention a column vector. The transpose is denoted by Unless stated otherwise, subscripts j, m, 
i and n are respectively indexes over voxels, stimulus types, mixture components and time point. The 
Gaussian distribution with mean /x and covariance matrix E is denoted by A/'(/x, E). 

A. The parcel-based model 

We first recast the parcel-based JDE model of [15, 16] in a missing data framework. Let us assume that 
the brain is decomposed in P = (V^)^=i:t parcels, each of them having homogeneous hemodynamic 
properties. The fMRI time series yj is measured in voxel j G at times {tn)n=i...N^ where tn = nTR, 
N being the number of scans and TR, the time of repetition. The number of different stimulus types 
or experimental conditions is M. For a given parcel containing a group of connected voxels, a 
unique BOLD signal model is used in order to link the observed data Y = {yj G M^, j G P^} to 
the HRF G M^"^^ specific to and to the response amplitudes A = {a"^,m = 1...M} with 
= {a^^j G V^} and being the magnitude at voxel j for condition m. More specifically, the 
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observation model at each voxel j e is expressed as follows [16]: 

M 

y- = Sjh^ + P£j + Sj, with Sj = ^ afXm (1) 

where Sjh^ is the summation of the stimulus-induced components of the BOLD signal. The binary 
matrix Xm = n = 1 ... N,d = ... D} is of size N X (D + 1) and provides information on 

the stimulus occurrences for the m-th experimental condition, At < TR being the sampling period of 
the unknown HRF = {hdAt^ d = . . .D} in Vj. This hemodynamic response is a consequence of 
the neuronal excitation which is commonly assumed to occur following stimulation. The scalars aj^'s 
are weights that model the transition between stimulations whose occurrences are informed by the Xm 
matrices (m = . . . M), and the vascular response informed by the filter hj. It follows that the aj^'s 
are generally referred to as Neural Response Levels (NRL). The rest of the signal is made of matrix 
P, which corresponds to physiological artifacts accounted for via a low frequency orthonormal function 
basis of size N xO. At each voxel j is associated a vector of low frequency drifts £j G which has to 
be estimated. Within parcel Vj, these vectors may be grouped into the same matrix L = {^j, j ^ V^}- 
Regarding the observation noise, the e/s are assumed to be independent with sj ^ A/'(0, TJ^) at voxel j 
(see Section II-Bl for more details). The set of all unknown precision matrices (inverse of the covariance 
matrices) is denoted by T = {Tj^j G Vj}. 

Finally, detection is handled through the introduction of activation class assignments 
Q = {g"^,m = 1...M} where q'^ = {qj^^j G V^} and qj^ represents the activation class at voxel j 
for experimental condition m. The NRL coefficients will therefore be expressed conditionally to these 
hidden variables. In other words, the NRL coefficients will depend on the activation status of the voxel 
j, which itself depends on the activation status of neighbouring voxels thanks to a Markov model used 
as a spatial prior on Q (cf Section II-B2c). Without loss of generality, here the number of classes is 
/ = 2 for activated and non-activated voxels. An additional deactivation class (/ = 3) may be considered 
depending on the experiment. In the following developments, all provided formulas are general enough 
to cover this case. 

B. A hierarchical Bayesian Model 

In a Bayesian framework, we first need to define the likelihood and prior distributions for the model 
variables (Y^A^h^^Q) and parameters (0). Using the hierarchical structure between Y, A, h^, Q 
and 0, the complete model is given by the joint distribution of both the observed and unobserved (or 
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missing) data: p{Y, A, h^, Q; 0) = p{Y \ A, hy, 0) &)p{A \ Q; 0) 0). To fully define the 

hierarchical model, we now specify each term in turn. 

1) Likelihood: 

The definition of the likelihood depends on the noise model. In [20,21], an autoregressive (AR) noise 
model has been adopted to account for serial correlations in fMRI time series. It has also been shown 
in [21] that a spatially-varying first-order AR noise model helped controlling false positive rate. In the 
same context, we will assume such a noise model sj ~ J\f{O^Tj^) with Tj = crj^-^j where Aj is a 
tridiagonal symmetric matrix which depends on the AR(1) parameter pj [16]: (Aj)i^i = {Aj)^^^ = 1, 
(A^)^,^ = 1 + for n = 2 : - 1 and {Aj)n+i,n = {Aj)n,n+i = -pj for n = 1 : N - 1. 
These parameters are assumed voxel- varying due to their tissue-dependence [22,23]. The likelihood can 
therefore be decomposed as: 

p{Y\A,hY,L,T)^ II \rj\-'/^exp{-^y)Tjyj), (2) 

where |rj| = l^jl' l^jl ~ ^ ~ Pj Vj ~ Vj ~ -^^j ~ ^jhj. 

2) Model priors: 

a) HRF: 

Akin to [15, 16], we introduce constraints in the HRF prior that favor smooth variations in by 
controlling its second order derivative: ~ J\f{O^VhR) withi? = (At)^ (^2^2)"^ where D2 is 
the second-order finite difference matrix and Vh is a parameter to be estimated. Moreover, boundary 
constraints have also been fixed on hj as in [15, 16] so that ho = ho At = 0. 

b) Neuronal response levels: 

Akin to [15, 16], the NRLs are assumed to be statistically independent across conditions: p{A; 6a) = 
Y{p{^^] ^m) where Oa = {^m, m = 1 . . . M} and 0^ gathers the parameters for the m-th condition. A 

m 

mixture model is then adopted by using the assignment variables to segregate non-activated voxels 
{q^ = 1) from activated ones {q^ = 2). For the m-th condition, and conditionally to the assignment 
variables g"^, the NRLs are assumed to be independent: p{a^ \ q^; 6m) = Yl p{oJp I Q^'^ ^m)- H qj^ = i 
then p{a'p\qf = i;Om) ^ J^il^im.Vim)- The Gaussian parameters Om = {l^im,Vim,i = 1.../} are 
unknown. We denote by ji ={ii^^m = 1...M} with ji^ = {//im? • • • ? Wm} and v = {i;^,m = 
1...M} with Vm = {^im, • • • More specifically, for non- activating voxels we set for all m, 

c) Activation classes: 

As in [15], we assume prior independence between the M experimental conditions regarding the 
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activation class assignments. It follows that p{Q) = H P{Q^'^Pm) where we assume in addition that 
p{q^] f3m) is a spatial Markov prior, namely a Potts model with interaction parameter f3m [15]: 

p{q"';/3m) = Z{M-'eM^mU{q^)) with = J] 0) 

and where Z{/3m) is the normalizing constant and for all (a, 6) G , 5(a, 6) = 1 if a = 6 and 
otherwise. The notation j ^ k means that the summation is over all neighboring voxels. Moreover, the 
neighboring system may cover a 3D scheme through the brain volume. The unknown parameters are 
denoted by /3 = {^5^, m = 1 . . .M}. In what follows, we will consider a 6-connexity 3D neighboring 
system. 

For the complete model, the whole set of parameters is denoted by© = {T, /jL^v^Vh^ 13} and belong 
to a set 0. 

III. Estimation by variational Expectation-Maximization 

We propose to use an Expectation-Maximization (EM) framework to deal with the missing data 
namely, A ^ A, ^ Q ^ Q. Let V be the set of all probability distributions on A x H x Q, 
EM can be viewed [24] as an alternating maximization procedure of a function on D, 0) = 
Ep[logp(y, A, I 0)] + G{p) where Ep[.] denotes the expectation with respect to p and G{p) = 
—Ep[\ogp{A^hj^Q)] is the entropy of p. At iteration (r), denoting the current parameter values by 
0^'^"^^ the alternating procedure proceeds as follows: 

E-step: Q = argmax T{p,&^''-^^) (4) 

' ^ pev 

M-step: 0(^) = argmax J^b?^ 0) (5) 
bg© ' ^ 

The optimization step in Eq. (4) leads to p^^'^jj^ q = p{A^ h^^ Q \ Y ^ 0^'^"^^), which is intractable for 
our model. Hence, we resort to a variational EM variant in which the intractable posterior is approximated 
as a product of three pdfs on A, H and Q respectively. 

Previous attempts to use variational inference [25] in fMRI [17, 19] have been successful with this type 
of approximations usually validated by assessing its fidelity to its MCMC counterpart. In Section IV, 
we will also provide such a comparison. The fact that the HRF can be equivalently considered as 
missing variables or random parameters induces some similarity between our Variational EM variant 
and the Variational Bayesian EM algorithm presented in [25]. Our framework varies slightly from the 
case of conjugate exponential models described in [25] and more importantly, our presentation offers the 



February 8, 2012 



draft 



8 



possibility to deal with extra parameters for which prior information may not be available. 
We propose here to use an EM variant in which the intractable E-step is instead solved over V, a restricted 
class of probability distributions chosen as the set of distributions that factorize as pa,h^,q = PaPh^Pq 
where pA, Ph^ and pQ are probability distributions on A, % and Q, respectively. It follows then that our 
E-step becomes an approximate E-step, which can be further decomposed into three stages that consist 
of updating the three pdfs, ph^^ Pa and pg, in turn using three equivalent expressions of when p 
factorizes as in V. At iteration (r) with current estimates denoted by Ph~^\pa~^^ ^Pq~^^ 0^^"^^ 
the updating rules become: 



E-H: p"^ =argmaxJ'(^ ^^h^Pq ^\®^'^ 
E-A: # =argmaxJ'(pA# 

Pa ^ 

E-Q: pg) = argmax^(^;) pj^ PqM'-'^ 

Pq ^ 

Introducing the KuUback-Leibler divergence between pa,h^,q and pa,h^,q , we have 

'D{pA,H,,Q \\pa,h,,q) = [ Pa,h,,q{A, Q) log ^^■^-■Q^^' ^ dA dh^ dQ. (6) 

According to [24], we also have J^{pa,h^,q, 0) = logp{Y] @) -V{pa,h^,q \ \pa,h^,q) so that the steps 
above can be equivalently written in terms of minimizations of the KuUback-Leibler divergence. The 
properties of the latter lead to the following solutions: 

E-H: ^Hlih^) oc exp (E^.-^,^,.-^, [logp{h^ \ Y, A, Q; 0(^-i)]) (7) 

E-A: #(A) oc exp (e^^^)^j-i) [logp{A \ Y, h^, Q; 0(^-i))]) (8) 

E-Q: pg)(g)ocexp(E^o^o [logp(Q|l^,A,/i^^ . (9) 
The corresponding M-step is (since and G{Pa^h q) are independent): 

M: 0W = arg max E^(.)^^.)^^.) [logp(l^, A, h^, Q; 0)] . (10) 

These steps lead to explicit calculations for p]q and the parameter set 



0W = {rW,LW,/xW,t;('-),4''\/3W}. 



• E-H step: From Eq. (7) standard algebra enables to derive that is a Gaussian distribution 
~ M{rrip , ) whose parameters are detailed in Appendix A. The expressions for rn^ and 
S)^^ are similar to those derived in the MCMC case [16, Eq. (B.l)] with expressions involving the 
aj^'s replaced by their expectations wrt 
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E-A step : Using Eq. (8), standard algebra rules allow to identify the Gaussian distribution of 
which writes as = Y\jev P^A- "^^^ Pa^- ^ -^(^a^ ^^a )- More detail about the update of 
is given in Appendix B. The relationship with the MCMC update of A is not straightforward. In 
[15, 16], the aj^'s are sampled independently and conditionally on the gj^'s. This is not the case in 
the VEM framework but some similarity appears if we set the probabilities 'PQ^^\i) either to or 

ir) 

1 and consider only the diagonal part of 

E-Q step: Using the expressions of p{A\Q) and p{Q) in Section II, Eq. (9) yields Pq\q) = 
n PQrn{q^) which is intractable due to the Markov prior. To overcome this difficulty, a number 

m=l 

of approximation techniques are available. To decrease the computational complexity of our EM 
algorithm or to avoid introducing additional variables as done in [19], we use a mean-field like 
algorithm which consists of fixing the neighbours to their mean value. Following [26], PQm{q^) 



can be approximated by a factorized density PnLiq^) = Yl PnmiQj^) such that if qj 



(z) oc Ar(mW ; /.;::^\ ^t"' V(^r = ^ I ^^^i; ^r^^)^ (n) 

where is a particular configuration of q^ updated at each iteration according to a specific scheme, 
~ j denotes neighbouring voxels to j on the brain volume and f{q^ = i \ q^-] (3m~^\vm~^^) oc 

(r) 

exp{ +/3m z2 ^{i^^^)}' Hereabove, m\L and > denote the m and (m^m ) entries 

^im J^^j A. A. 

ir) ir) 

of the mean vector {rn\\) and covariance matrix {^\.), respectively. The Gaussian distribution with 
mean /i^^ and variance is denoted by J\f{ . ; /izm, ^zm). while q^- = {q^-, k ^ j}. More details 
are given in Appendix C. 

M step: For this maximization step, we can first rewrite Eq. (10) as 



arg max 





^p^I¥h\ I ^' ^^75 r)] + E^o^o [logp{A I Q; /x, v)] 



+ E^^.) [logp{hY,Vh)] +E^r, [logp(Q;/3)]]. (12) 

The M-step can therefore be decoupled into four sub-steps involving separately (L^T), (/x, v), 
and f3. Some of these sub-steps admit closed-form expressions, while some other require resorting to 
iterative or alternate optimization. For more details about related calculations, the interested reader 
can refer to Appendix D. 

IV. Validation of the proposed approach 

This section aims at validating the proposed variational approach. Simulated and real contexts are 
considered respectively in sub-sections IV- A and IV-B. To corroborate the effectiveness of the proposed 
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method, comparisons with its MCMC counterpart will also be conducted throughout the present section. 

A. Artificial fMRI datasets 

In this section, experiments have been conducted on data simulated according to the observation model 
in Eq. (1) where P has been defined from a cosine transform basis as in [16]. The simulation process is 
illustrated in Fig 1. 




scan number Time in s scan number scan number 

Fig. 1. From left to right: Sj represents the experimental conditions for voxel j as the scan number increases. 

They are convoluted with the hemodynamic model of parcel V^', Sj + Plj represents the noise and artifact 

components for voxel j with respect to the scan number and yj is the final fMRI time series simulated at voxel j. 

Different studies have then been conducted in order to validate the detection-estimation performance 
and robustness. For each of these studies, some simulation parameters have been changed such as the 
noise or the paradigm properties. Changing these parameters aims at providing for each simulation context 
a realistic BOLD signal. 

1) Detection-Estimation performance: 
Data have been simulated here with a Gaussian white noise FJ^ = 1.2 /at {In is the x identity 
matrix). Two experimental conditions have also been considered (M = 2) while ensuring stimulus-varying 
Contrast-to-Noise Ratios (CNR)^ achieved by setting /ii2 = 2.8,i;i2 = 0.5^ and /X22 = l.S^V22 = 0.5^ 
so that higher CNR is simulated for the first experimental condition (m = 1) compared to the second 
one (m = 2). For each of these conditions, the initial artificial paradigm comprised 30 stimulus events. 
The simulation process finally yielded time-series of 268 time-points. Condition- specific activating and 
non- activating voxels were defined as 20x20 2D slices as shown in Fig. 2[left], respectively. 

The posterior probability maps (PPM) obtained using VEM and MCMC are shown in Fig. 2 [middle] 
and Fig. 2[right]. PPMs here correspond to the activation class assignment probability. These figures 

^For two Regions of Interest (ROI), CNR = 2(/ii — 112)^ I {v\ + V2) where (/ii, vi) (resp. (/j.2,V2)) are the intensity mean 
and variance within ROI 1 (resp. ROI 2). 
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clearly show the gain in robustness provided by the variational approximation. This gain consists of 
lower miss-classification noise (a lower false positive rate) illustrated by higher PPM values, especially 
for the experimental condition with the lowest CNR (m = 2). 

For a quantitative evaluation, the ROC curves corresponding to the estimated PPMs using both algorithms 
were computed. They are reported in Fig. 3 and confirm that the proposed VEM approach outperforms 
the MCMC implementation for the second experimental condition (m = 2). Conversely, for the higher 
CNR (m = 1), the MCMC approach performs sUghtly better. 



m — 1 



m — 2 



- - MCMC 
VEM 



C/2 0.8 



^ 0.8 \t 



MCMC 
VEM 



0.0 0.2 0.4 0.6 0.8 1.0 



0.0 0.2 0.4 0.6 0.8 1.0 



1 -specificity 1 -specificity 

Fig. 3. ROC curves associated with the label estimates using VEM and MCMC. Condition m — 1 presents higher CNR than 
condition m = 2. Curves are plotted in solid and dashed line for VEM and MCMC, respectively. 



Fig. 4 shows the NRL estimates obtained by the two methods. Although some differences are exhibited 
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on the PPM, both algorithms report similar qualitative results. However, Fig. 4[right] shows the difference 
between NRL estimates (VEM-MCMC). It is worth noticing in this figure that regions corresponding 
to activated areas for the two conditions present positive intensity values, which shows that VEM helps 
retrieving stronger NRL values for activated area compared to MCMC. Quantitatively speaking, the 
gain in robustness is confirmed by reporting the Mean Square Error (MSE) values on NRL estimates 
which are slightly lower using VEM compared to MCMC for the first experimental condition (m = 1: 
MSEmcmc = 0.012 vs. MSEvEM = 0.010), as well as for the second experimental condition (m = 2: 
MSEmcmc = 0.010 vs. MSEvEM = 0.009). These error values indicate that, even though the MCMC 
algorithm gives the most precise PPMs for the high CNR condition (Fig. 3, m = 1), the VEM approach is 
more robust than its MCMC alternative in terms of estimated response levels. These values also indicate 
slightly lower MSE for the second experimental conditions (m = 2) compared to the first one (m = 1) 
with higher CNR. This difference is due to the presence of larger non-activated area used for m = 2 
where low NRL values are simulated, and for which MSE is very low. 



Ground Truth 



MCMC 



m — 1 



m = 2 




VEM-MCMC 




Fig. 4. Ground truth and NRL estimates by MCMC and VEM (first three images), and difference NRL image (right). 



As regards HRF estimates. Fig. 5 shows both retrieved shapes using MCMC and VEM. Compared 
to the ground truth curve (solid line), the two approaches give very similar results and preserve the 
most important features of the original HRF like the peak value (PV), time-to-peak (TTP) and time-to- 
undershoot (TTU). 

2) Estimation robustness: 

Since estimation errors may be caused by several sources of perturbation {e.g. noise level, stimulus 
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density,...), several experiments have been conducted with different values of simulation parameters. 

a) Varying the stimulus density: 
In this experiment, several simulations have been generated using different stimulus densities in the 
artificial paradigm (from 5 to 30 stimuli), which leads to different Inter Stimuli Intervals (ISI) (from 
47 s to 9 s). Note here that the stimuli are interleaved between the two conditions so that the above- 
mentioned ISIs correspond to the time interval between two events irrespective to the condition they 
belong to. A second order autoregressive noise (AR(2)) has also been used since it has been reported in 
the literature that such a model provided realistic BOLD signal [21]. The rest of the model is specified as 
before. In order to quantitatively evaluate the robustness of the proposed VEM approach to input Signal 
to Noise Ratio (SNR)^, results (assuming white noise for both algorithms) are compared while varying 
the stimulation rate during the BOLD signal acquisition. Fig. 6 illustrates the MSB evolution related to 
the NRL estimates for both experimental conditions wrt the ISI (or equivalently the stimulus density) 
in the experimental paradigm. This figure shows that at low SNR level, i.e. high ISI (or low stimulus 
density), and for both conditions, VEM is more robust to model mispecification. In contrast, at low ISI 
(i.e. high stimulus density), the two methods perform similarly and remain quite robust. It should be 
noted here that, as reported in Section IV-Al, the error values on NRL estimates remain comparable for 
both experimental conditions and for all ISI values, although PPM results may present some imprecisions 
for the low CNR condition. Results shown in Fig. 6 were obtained over 100 simulations to investigate the 
NRL estimation MSB variance through runs. It can therefore be noticed through this figure that, for the 
two experimental conditions, higher MSB variance (larger error bars) is obtained with MCMC compared 

^The SNR is given by: SNR = 10 log ^ WSjh^W^/ ^ 
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to VEM. Moreover, estimation variance across simulations (error bars) increase with ISI as expected. 
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Fig. 6. MSB evolution of NRL estimates wrt Inter Stimuli Interval (ISI) for experimental conditions m — \ and m — 2. 
Vertical bars represent empirical standard deviations computed over 100 simulations. 



As regards hemodynamics properties, Fig. 7 [left] depicts the MSB on HRF estimates inferred by the 
VEM and MCMC wrt the ISI (or equivalently the stimulus density). The VEM approach performs better 
than the MCMC one especially at low stimulus density (high ISI values), but the error level remains 
relatively low for both methods. When evaluating the estimation robustness of the key HRF features (PV, 
TTP and TTU), it turns out that the TTP and TTU estimates remain the same irrespective to the inference 
algorithm, which corroborates the robustness of the developed approach (results not shown). As regards 
PV estimates. Fig. 7 [right] shows the MSE values wrt the ISI. The VEM algorithm still performs better 
than the MCMC one mainly at high ISI values (low stimulus density). For more complete comparisons, 
similar experiments have been conducted while changing the ground truth HRF properties (PV, TTP, 
TTU), and coherent results have been obtained. 

h) Varying the noise parameters: 
In this experiment, several simulations have been conducted using an AR(2) noise with different variance 
and correlation parameters in order to illustrate the robustness of the proposed VEM approach to noise 
parameter fluctuation. In Fig. 8, MSE on NRL estimates is plotted against the input SNR when varying 
the noise variance (Fig. 8[top]) and its amount of autocorrelation (Fig. 8[bottom]). In the latter case, the 
AR parameters are changed while maintaining a stable AR(2) process. As already observed in [27], at a 
fixed input SNR value, the impact of large autocorrelation is stronger than that of large noise variance 
irrespective of the inference scheme. At low input SNR (as usually observed on real BOLD signals), this 
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feature is mainly shown through Figs. 8 [bottom] -[top]. Moreover, although a slight advantage is observed 
for the proposed VEM approach in terms of MSE and for both experimental conditions, the two methods 
perform generally well with a relatively low error level. The illustrated results were obtained over 100 
simulations in order to investigate the estimation error variance (vertical bars in Fig. 8). These error bars 
show that changing the amount of correlation (Fig. 8[top]) induces lower variance across simulations 
than when changing the noise variance (Fig. 8[bottom]). 

c) Varying the spatial regularisation parameter: 
This section is dedicated to studying the robustness of the spatial regularisation parameter estimation. 
When positive, this parameter favors spatial regularity across adjacent voxels, and hence smoother activa- 
tion maps. Fig. 9 shows the estimated mean value and standard deviations for /3 over 100 simulations using 
both algorithms and for the two experimental conditions. Three main regions can be distinguished for 
both experimental conditions. The first one corresponds to /3 valuer lower than 0.8, which approximatively 
matches the phase transition critical value = log(l + \/2) = 0.88 for the 2-class Potts model. For this 
region. Fig. 9 shows that the VEM curve (green line) appears to be closer to the Ground truth (black 
line) than the MCMC curve (blue line). Also, the proposed VEM approach gives more precise estimation, 
especially for the first experimental condition (m = 1) having relatively high CNR. The second region 
corresponds to (3 values between 0.8 and 1.1, where MCMC becomes more robust than VEM. Beyond 
/3 = 1.1, we can identify the third region where both methods give less robust estimation than the first 
two regions. Based on these regions, we conclude that the mean-field variational approximation improves 
the estimation performance up to a given critical value. It turns out that such an approximation is more 
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valid for low beta values, which usually correspond to (3 values observed on real BOLD fMRI data. 
When comparing estimates for the two conditions, the curves in Fig. 9 show that both methods generally 
estimate more precise /3 values for the first experimental condition (m = 1) having higher input CNR. 
For both cases, and across the three regions identified hereabove, the error bars show that the VEM 
approach generally gives less scattered estimates (lower standard deviations) than the MCMC one, which 
confirms the gain in robustness induced by the variational approximation. 

Note here that estimated /3 values in the experiment of Section IV- A 1 lie in the first region for the 
first experimental condition (/3f^^^^ = 0.74 and (3^^^ = 0.75). For the second condition (m = 2), 
and because input SNR is relatively low, no clear conclusion can be made since MCMC and VEM give 
relatively different values (/3^^^^ = 0.61 and /32^^ = 1.01) and no ground truth is available since the 
activation maps have been drawn by hand and not simulated according to the Markov model. 
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B. Real fMRI datasets 

This section is dedicated to the experimental vaUdation of the proposed VEM approach in a real 
context. Experiments were conducted on real fMRI datasets collected on healthy adult subjects who gave 
informed written consent. Data were collected with a 3-Tesla Siemens Trio scanner using an MPRAGE 
sequence for the anatomical MRI and a Gradient-Echo Echo Planar Imaging (GRE-EPI) sequence for the 
fMRI experiment. The acquisition parameters for the MPRAGE sequence were set as follows: Time of 
Echo: TE = 2.98ms; Time of Repetition: TR = 2300ms; sagittal orientation; spatial in-plane resolution: 
1 X Imm^; Field of View: FOV = 256mm^ and slice thickness: 1.1mm. Regarding the EPI sequence, 
we used the following settings: the fMRI session consisted of = 128 EPI volumes, where each 
scan was acquired using TR — 2400ms, TE — 30ms, slice thickness: 3mm, transversal orientation, 
FOV = 192mm^ and spatial in-plane resolution was set to 2 x 2mm^. Data was collected using a 32 
channel head coil to enable parallel imaging during the EPI acquisitions. Parallel SENSE imaging was 
used to keep a reasonable Time of Repetition (TR) value in the context of high spatial resolution. 
For the fMRI experiment, a functional localizer paradigm [28] was used, that enables to quickly map cog- 
nitive brain functions such as reading, language comprehension and mental calculations as well as primary 
sensory-motor functions. It consists of a fast event-related design comprising sixty auditory, visual and 
motor stimuli, defined in ten experimental conditions and divided in two presentation modalities (auditory 
and visual sentences, auditory and visual calculations, left/right auditorily and visually induced motor 
responses, horizontal and vertical checkerboards). The average ISI is 3.75 s including all experimental 
conditions. After standard pre-processing steps (slice-timing and motion corrections, normalization to the 
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MNI space), the whole brain fMRI data was first parcellated into F = 600 functionally homogeneous 
parcels by resorting to the approach described in [29]. It consisted of a spatially constrained hierarchical 
clustering (Euclidean distance, Ward's linkage) of functional features extracted via a classical GLM 
analysis. This parcellation was used as input of the JDE procedure, together with the fMRI time series. 
We stress the fact that the latter signals were not spatially smoothed prior to the analysis as opposed to 
the classical SPM-based fMRI processing. In what follows, we compare the MCMC and VEM versions 
of JDE by focusing on two contrasts of interest: i) the Visual-Auditory (VA) contrast which targets 
positive and negative evoked activity in the primary occipital and temporal cortices, respectively, and ii) 
the Computation-Sentences (CS) contrast which aims at highlighting higher cognitive brain functions. 
Besides, results on HRF estimates are reported for the two JDE versions and compared to the canonical 
HRF, as well as maps of regularisation factor estimates. 

Fig. 10 shows results for the VA contrast. High positive values are bilaterally recovered in the occipital 
region and the overall cluster localizations are consistent for both MCMC and VEM algorithms. The 
only difference lies in the temporal auditory regions, especially on the right side, where VEM yields 
rather more negative values than MCMC. VEM seems thus more sensitive than MCMC. The bottom 
part of Fig. 10 compares the estimated values of the regularisation factors /3 between VEM and MCMC 
algorithms for two experimental conditions involved in the VA contrast. Since these estimates are only 
relevant in parcels which are activated by at least one condition, a mask was applied to hide non-activated 
parcels. We used the following criterion to classify a parcel as activated: max{(/ii^)i<^<M} > 8 (and 
non-activated otherwise). These maps of /3 estimates show that VEM yields more contrasted values 
between the visual and auditory conditions. To be more precise. Table I provides the estimated /3 values 
in the highlighted parcels of interest. The auditory condition is not active and yields lower /3 values in 
both parcels whereas the visual condition is associated with higher values. The latter comment holds for 
both algorithms but VEM provides much lower values (|^^^EM ^ 0.01) than MCMC (/3mcmc ^ 1-07) 

^vis. 

for the inactive condition. For the active condition, the situation is comparable, with /3yem ^1-1 and 

^vis. 

/^MCMC ^ 1-25. We illustrate here a noteworthy difference between VEM and MCM and state the impact 
of the mean field and variational approximations, so that the hidden field has not the same behaviour 
between the two algorithms. Still, this discrepancy is not visible on the NRL maps. 
Fig. 10[a-b] depicts HRF estimation results which are rather close for both methods in the two regions 
under consideration. VEM and MCMC HRF estimations are also consistent with the canonical HRF shape. 
Indeed, the latter has been precisely calibrated on visual regions [4]. We can note a higher variability 
in the undershoot part, which can be explained first by the event-related nature of the paradigm where 
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successive evoked responses are likely to overlap so that it is more difficult to disentangle their ends and 
second by the signal strength which is inevitably low in the tail of the response. To conclude on the VA 
contrast which focused on well-known sensory regions, VEM provides sensitive results consistent with 
the MCMC version, both wrt detection and estimation tasks. 

(a) 
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Fig. 10. Results for the Visual-Auditory contrast obtained by the VEM and MCMC JDE versions. Top left column: contrast 
maps for MCMC, top middle column: contrast maps for VEM, with sagittal, coronal and axial views from top to bottom lines 
(neurological convention, left is left). On the top right part: plots of HRF estimates for VEM and MCMC in the two regions 
circled in indigo and magenta on the maps: occipital left (a) and right (b), respectively. The canonical HRF shape is depicted 
in dashed line. The bottom part shows axial maps of estimated regularisation factors /3 for two conditions, auditory (aud.) and 
visual (vis.), involved in the VA contrast. Parcels that are not activated by any condition are hidden. For all maps, the input 
parcellation is superimposed in white contours. 
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Results related to the Computation-Sentences (CS) contrast are depicted in Fig. 11. As for VA, 

contrast maps are roughly equivalent for VEM and MCMC in terms of cluster localizations. Still, we 
observe that MCMC seems quite less specific than VEM as positive values are exhibited in the white 
matter for MCMC, and not for VEM (compare especially the middle part of the axial views). For the 
estimates of the regularisation factors, the situation is globally almost the same as for the VA contrast, 
with VEM yielding more contrasted /3 maps than MCMC. However, these values are slightly lower than 
the ones reported for the VA contrast. 

We first focus on the left frontal cluster, located in the middle frontal gyrus which has consistently been 
exhibited as involved in mental calculation [30]. HRF estimates in this region are shown in Fig. 11 [(b)] 
and strongly departs from the canonical version. Especially, the TTP value is much more delayed with 
JDE (7.5 s), compared to the canonical situation (5 s). The VEM and MCMC shapes are close to each 
other, except for the beginning of the curves where VEM presents an initial dip. This might be interpreted 
as a higher temporal regularisation for the MCMC version. Still, the most meaningful HRF features such 
as the TTP and the Full Width at Half Maximum (FWHM) are very similar. 

The second region of interest for the CS contrast is located in the inferior parietal lobule and is also 
consistent with the computation task [30]. Note that the contrast value is lower than the one estimated 
in the frontal region. Results for the regularisation factors, as shown in Table I [4^^ col.], indicate that 
/3 for VEM and the Sentence condition (i.e, /Sy^^) is not as low as it was for the other parcels and 
the inactive conditions (1.19 against 0.01). This is due to the fact that both the Computation and the 
Sentence conditions yield activations in this parcel, which is confirmed by the low contrast value. 
HRF estimates are shown in Fig. 11 [(a)]. The statement relative to the previous region holds again: they 
strongly differ from the canonical version. When comparing MCMC and VEM, even if the global shape 
and the TTP position are similar, the initial dip is still stronger with VEM and the corresponding FWHM 
is also smaller than for the MCMC version. As previously mentioned, this suggests that MCMC may 
tend to over-smooth the HRF shape. 

The studied contrasts represent decreasing CNR situations, with the VA contrast being the stronger and 
CS the weaker. From the detection point of view, the contrast maps are very similar for both JDE versions 
and this result is only dimly affected by the CNR variation. In contrast, HRF estimation results are much 
more sensitive to this CNR variation, with stronger discrepancies between the VEM and MCMC versions, 
especially for the HRF estimates associated with the CS parietal cluster. The latter shows the weaker 
contrast amplitude. Still, both versions provide results in agreement on the time-to-peak and FHWM 
values. Indeed, the differences mainly concern the heading and tailing parts of the HRF curves. 
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Fig. 11. Results for the Computation- Sentences contrast obtained by the VEM and MCMC JDE versions. Top left column: 
contrast maps for MCMC, top middle column: contrast maps for VEM, with sagittal, coronal and axial views from top to 
bottom lines (neurological convention, left is left). On the top right part: plots of HRF estimates for VEM and MCMC in the 
two regions circled in indigo and magenta on the maps: left parietal lobule (a) and left middle frontal gyrus (b), repectively. 
The canonical HRF shape is depicted in dashed line. The bottom part shows axial maps of estimated regularisation factors /3 
for two conditions, computation (comp.) and sentence (sent.), involved in the CS contrast. Parcels that are not activated by any 
condition are hidden. For all contrast maps, the input parcellation is superimposed in white contours. 



V. Algorithmic efficiency 

In this section, the computational performance of the two approaches is compared on both artificial and 
real fMRI datasets. Both algorithms were implemented in Python and fully optimized by resorting to the 
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TABLE I 

Comparison between JDE VEM and MCMC on the estimated regularisation parameters f3 for the 
experimental conditions involved in the studied contrasts: visual- auditive (va) and 
Computation-Sentences (CS). Results are provided for the two highlighted parcels for each contrast 

(SEE Figs. 10 and 11). 





VA contrast 


CS contrast 




parcel 7^^^ 


parcel 7^^ 


parcel 7P 


parcel 7P 




Vis. Aud. 


Vis. Aud. 


Comp. Sent. 


Comp. Sent. 


MCMC 


1.28 1.08 


1.24 1.05 


1.08 0.68 


1.07 0.64 


VEM 


1.14 0.01 


1.08 0.01 


0.91 0.01 


0.82 1.19 



efficient array operations of the Numpy library ^ as well as C-extensions for the computationally intensive 
parts (eg, NRL sampling in MCMC or the E-Z step for VEM). Moreover, our implementation handled 
distributed computing resources as the JDE analysis consists of parcel-wise independent processings 
which can thus be performed in parallel. This code is available in the PyHRF package ^. 
For both the VEM and MCMC algorithms, the same stopping criterion is used. This criterion consists 
of simultaneously evaluating the online relative variation of each estimate. In other words, for instance 

for the estimated h^, one has to check whether ch = < 10 ^. By evaluating a similar 

^ ll^Vlli 

criterion ca for the NRLs estimates, the algorithm is finally stopped once ch < 10~^ and ca < 10~^. 
For the MCMC algorithm, this criterion is only computed after the burn-in period, when the samples are 
assumed to be drawn from the target distribution. The burn-in period has been fixed manually based on 
several a posteriori controls of simulated chains relative to different runs (here 1000 iterations). More 
sophisticated convergence monitoring techniques [31] should be used to stop the MCMC algorithm, but 
we chose the same criterion as for the VEM to carry out a more direct comparison. 

Considering the artificial dataset presented in Section IV-Al, Fig. 12 illustrates the evolution of ch 
and CA with respect to the computational time for both algorithms. Only about 18 seconds are enough 
to reach convergence for the VEM algorithm, while the MCMC alternative needs about 1 minute to 
converge on the same Intel Core 4-3.20 GHz - 4 Gb RAM architecture. The horizontal line in the blue 
curve relative to the MCMC algorithm corresponds to the burn-in period (1000 iterations). It can also 
be observed through these curves that NRL estimates converge faster than HRF estimates with the VEM 
approach, while the convergence speed seems to be the same using the MCMC algorithm. 

^ http ://numpy. scipy. org 
^http ://w ww.pyhrf.org 
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time (s) time (s) 

Fig. 12. Convergence curves in logarithmic scale of HRF (left) and NRL (right) estimates using MCMC and VEM. 



To illustrate the impact of the problem dimensions on the computational cost of both methods, Fig. 13 
shows the evolution of the computational time of one iteration when varying the number of voxel (left), 
the number of experimental conditions (middle) and the number of scans (right). The three curves show 
that the computational time increases almost linearly (see the blue and red curves) for both algorithms, but 
with different slopes. Blue curves (VEM) have steeper slopes than red ones (MCMC) in the three plots 
showing that the computational time of one iteration increases faster with VEM than with MCMC wrt 
the problem dimensions. As regards computational performance on the real fMRI data set presented in 



(a) (b) (c) 




number of voxels number of conditions number of scans 



Fig. 13. Evolution of the computational time per iteration using the MCMC and VEM algorithms when varying the problem 
dimension according to: (a): number of voxels; (b): number of conditions; (c): number of scans. 

Section IV-B and comprising 600 parcels, the VEM also appeared faster as it took 1 hour 30 to perform a 
whole brain analysis whereas the MCMC version took 12 hours. These analysis timings were obtained by 
a serial processing of all parcels for both approaches. When resorting to the distributed implementation, 
the analysis durations boiled down to 7 mins for VEM and 20 mins for MCMC (on a 128-cores cluster). 
To go further, we illustrate the computational time difference (^mcmc — ^vem) between both algorithms in 
terms of parcel size which ranged from 50 to 580 voxels. As VEM vs. MCMC efficiency appears to be 
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influenced by the level of activity within the parcel, we resorted to the same criterion as in Section IV-B 
to distinguish non-active from active parcels and tag the analysis durations accordingly in Fig. 14. 

Fig. 14[(a)] clearly shows that the differential timing between both algorithms is higher for non- 
activated parcels (blue dots) and increases with the parcel size, which confirms the utility of the proposed 
VEM approach especially in low CNR/SNR circumstances. To further investigate the gain in terms 
of computational time induced by using the VEM approach. Fig. 14[(b)] illustrates the gain factor 
(^mcmc/^vem) for activated and non-activated parcels. This figure shows that the VEM algorithm always 
performs better than the MCMC one since all obtained gain factors are greater than 1 (see horizontal 
line in Fig. 14[(b)]). Moreover, the gain factor is clearly higher for non-activated parcels for which the 
input SNRs and CNRs are relatively low, and it generally varies between 2.7 and 80. 
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Fig. 14. Comparison of durations for MCMC and VEM analyses in terms of parcel size, (a): differential timing tMCMC — tvEM- 
(b): gain factor of VEM compared to MCMC (tMCMc/tvEM), the horizontal hne indicates a gain factor of one (tMCMC = tvEM). 
Circles are red-colored for parcels estimated as activated, ie max{(/iim)i<m<M} > 8 and blue-colored otherwise. 



VI. Conclusion and future work 

In this paper, we have proposed a new method for parcel-based joint detection-estimation of brain 
activity from fMRI data. The proposed method rehes on a Variational EM algorithm as an alternative 
solution to intensive stochastic sampling used in previous work [16]. Compared to JDE MCMC, the 
proposed VEM approach does not require priors on the model parameters for inference to be carried 
out. However, for more robustness and to make the proposed approach completely auto-calibrated, the 
adopted model may be extended by injecting additional priors on some of its parameters as detailed in 
Appendix D for /3 and estimation. 

Illustrations on simulated and real datasets have been deeply conducted in order to evaluate the robustness 
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of the proposed method compared to its MCMC counterpart in different experimental contexts. Simula- 
tions have shown that the proposed VEM algorithm gave more precise estimation of activation labels and 
NRLs especially at low input CNR, while giving similar performance for HRF estimation. Simulations 
have also shown that our approach was more robust to stimulus density decrease (or equivalently 
ISI time increase). Similar conclusions have been drawn wrt noise level and autocorrelation structure. 
In addition, our VEM approach gave more robust estimation of the spatial regularisation parameter 
and more compact activation maps that are likely to better account for functional homogeneity. These 
good features of the VEM approach are provided in shorter computational time than with the MCMC 
implementation. Simulations have also been conducted to study the computational time variation wrt the 
problem dimensions, which may significantly vary from one experimental context to another. 
Regarding real data experiments, VEM and MCMC showed similar results with a higher specificity for the 
former. Although no ground truth is available in this context, these results further emphasize the interest 
of using VEM for a high gain in computational time. From a practical viewpoint, another advantage 
of the proposed algorithm lies in its simplicity. By contrast to the MCMC implementation of [15], the 
VEM algorithm only requires a simple stopping criterion. It is also more flexible to account for more 
complex situations such as those involving higher AR noise order, habituation modeling or considering 
three instead of two activation classes with an additional deactivation class. 

To confirm the impact of the proposed approach, comparisons between the MCMC and VEM approaches 
should also take place at the group level. In other words, we should compare the results of random effect 
analyses (RFX) based on a Student t-test on mean effects. A first RFX analysis would correspond to the 
classical approach, in which the input data are given by the normalized effects of a standard individual 
SPM analysis. Subsequent analyses would take the results of the VEM and MCMC approaches for each 
subject as inputs to RFX analyses. In the same vein, a seminal study has been performed in [7] where 
group results based on IDE MCMC intra-subject analyses provide higher sensitivity than results based 
on GLM based intra-subject analyses. Such a group-level validation would also shed the light on the 
impact of the used variational approximation in VEM. In fact, no preliminary spatial smoothing is used 
in the IDE approach by contrast to standard fMRI analyses where this smoothing helps retrieving clearer 
activation clusters. In this context, the used mean field approximation especially at the E-Q step should 
help getting less noisy activation clusters compared to the MCMC approach. Eventually, akin to [16], the 
model used in our approach accounts for functional homogeneity at the parcel scale. These parcels are 
assumed to be an input of the proposed IDE procedure and can be a priori provided independently by 
any parcellation technique [29, 32]. In the present work, parcels have been extracted based on functional 
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features extracted via a classical GLM processing supposing a canonical HRF for the entire brain. This 
assumption does not bias our HRF local model estimation since a large number of parcels is considered 
(600 parcels) with an average parcel size of 250 voxels. 

On real dataset, results may therefore depend on the reliability of the used parcellation technique. A 
sensitivity analysis has been performed in [33] on real data and for MCMC JDE version, that assesses 
the reliability of the used parcellation against a heavy approach where the parcellation was marginalised. 
Still, it would be of interest to investigate the effect of the parcellation choice in the VEM context, and 
more generally to extend the present framework to incorporate an automatic online parcellation strategy 
to better fit the fMRI data while accounting for the HRF variability across subjects, populations and 
experimental contexts. The current variational framework has the advantage to be easily augmented with 
parcel estimation as an additional layer in the hierarchical model. This will then raise the question of 
model selection, in particular the issue of well separating parcels at best Le. in a sparse manner so as 
to capture the spatial variability in hemodynamic territories while enabling the reproducibility of parcel 
identification across fMRI datasets. More generally, an approach to model selection can be easily carried 
out within the VEM implementation as variational approximations of standard information criteria based 
on penalised log-evidence can be efficiently used [34]. 

Appendix 

A. E-H step: 

For the E-H step, the expressions for and are: 

^'hI = [i/vt''R-' + E ( E v^:-'^^,xlr^''-'^x^, + slr^^'s,)]- (i3) 

<i = ^'■hI E rj"-^) (j/, - P£f-% (14) 

_ M 

with Sj = ^Jm^^-X^m, m^m^^ and v^^^^^^f denoting respectively the m and (m, m') entries of the mean vector (m^~^^) 

m=l ^ 3 3 

and covariance matrix (E^"^^) of the current 

B. E-A step: 

The E-A step also leads to a Gaussian pdf for p^^\ ~ Y\ -^(^a^ ^a^)- ^he parameters are updated as follows: 

S«= (^A.,+«,)-\ rni^] = Si^] A.^^r^^ + G^rf-)!^, - P^r^')) (15) 

where a number of intermediate quantities need to be specified. First, ^J^!f~^^ — • • • ^mIm ^ — ^p^^^^ [^] 

where G is the matrix G — [gi\ ... | qm] made of columns Qm — Xmh^- The m-th column of G is then also denoted by 
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= Xram^nl ^ ' Then, Aij = diag^ [pq-^^^)/^!^"^^] and Hj = E^(.) [G'T^'"^^ G] is an M x M matrix whose 
element {m,m') is given by: 

C. E-Q step: 

From p{A\Q) and piQ) in Section II, it follows that the {a^,q^) couples correspond to independent hidden Potts models 
with Gaussian class distributions. It follows an approximation that factorizes over conditions: Pq (Q) — Y\ PQrn(q^) where 

m=l 

PqI {q'^) = f(q'^\a'^ = m^^l ; fi^Z~^\v^Z~^\l3i^~^^) is the posterior of q"^ in a modified hidden Potts model /, in which the 
observations aj^'s are replaced by their mean values m^m and an external field {otj^^^'^ = t'^m^m . . . , l/'^/^^^j , 

j G 7^7} is added to the prior Potts model p{q'^\ /3m ~^^). It follows that the defined Potts reads: 

oc exp{ [ccf '\qT) + ^/SL""'' E ^(^r^ 9^)) }• (16) 

Since the expression in Eq. (16) is intractable, and using the mean-field approximation [26], PQL{q^) is approximated by a 

) such that if 



factorized density p^jl {q^) = 11 Po- (C) such that if = z. 



^3 3 

where is a particular configuration of q^ updated at each iteration according to a specific scheme and 

f{qT I ?^j\Pt-'\vt-'^) oc expiate-) + ^S:"'^ E ^(qT, qD}- 

1) M-{ijL^v) step: 
By maximizing with respect to (fJ^^v), Eq. (12) reads: 

(/x^^\t;^^)) = argmax E (.) (.) [logp(A | Q; m, ^)] (18) 
(fj,,v) Pa Pq ^ 



By denoting p-^ = Pq^ (0' ^^^r deriving wrt fUm and Vim for every z G {1 . . . /} and m G {1 . . . M}, we get 

2) M-Vh step: 
By maximizing with respect to Vh, Eq. (12) reads: 

vl^^ = argmax E^(^) [log p{h^;vh)]. (19) 



It follows the closed-form expression given by 



'''' ~ ~ (D-1) ~ (D-1) • 

For a more accurate estimation of Vh, one may take advantage of the flexibility of the VEM inference and inject some prior 
knowledge about this parameter in the model. Since this parameter is positive, a suitable prior can be an exponential distribution 
with mean A~^: 

p{vh] Xvh) = exjp{-Xv^Vh}. (21) 
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(r) 

Taking this prior into account, the new expression of the current estimate vf^^ is: 

3) M-/3 step: 

By maximizing with respect to /3, Eq. (12) reads: 

(3^"^^ = argmax E (.) [\ogp{Q; (3)] . (22) 

(3 Pq 

Updating /3 consists of making further use of a mean field-Hke approximation [26], which leads to a function that can be 
optimized using a gradient algorithm. To avoid over-estimation of this key parameter for the spatial regularisation, one can 
introduce for each /3m, some prior knowledge p^Pm', that penalises high values. As in Eq. (21), an exponential prior with 
mean A^^ can be used. The expression to optimize is then given by: 

= argmax^ (,) [logpiq"^; f3m)p{f3m; Xpm)] 

= argmax{-logZ(/3^) +/3^(^E^.)^ [5{qT,qT)\ - A/s^)}. (23) 

After calculating the derivative wrt /3m, we retrieve the standard equation detailed in [26] in which ^-(r-) l^iflT^^k")] is 

replaced by ^ E_(r) [^{qj^, qT)] ~ ^f3m- It t)e easily seen that, as expected, subtracting the constant helps penalizing 

j^k ^Q"^ 
high Prn values. 

4) M-{L,T) step: 

This maximization problem factorizes over voxels so that for each j eV^, we need to compute: 



(4"\r."0 = argmax E (.) [logp(2/,- | a,-, /i^; T,-)] , (24) 

(^.•,r,) Ph^Pa, 

where aj = {aj^,m = 1 . . . M}. Finding the maximizer wrt £j leads to (G is defined in the E-A step): 

= argmax{2(Gm^^^ - yjfV^p Pij + t^P'V^Pij}. (25) 

^3 ' 

After calculating the derivative wrt Ij, we get I'^p = [P^T^ p)-^ P^T^ (yj-Gm^.). In the AR(1) case, with Tj = cr^"^ A^, 
we can then derive the following relationship: 

4"-) = (P*AfP)-ip*Af (^yj - S,m« ) Fi(pf), (26) 
where Fi is a function linking the estimates and p^'^^ The above formula is similar to that in [16, p.965, B.2], when 

(r) (r) 

replacing by m}^^ and a by m\\ 

Denoting = yj — Pi^^ and considering the maximization wrt aj, similar calculations lead to: 



3 N 



I (e^.<., [ajAf «,] - 2m^GA<f^yr + yf^'Afyf )) = M4\i<f^ ill) 

where F2 is a function linking the estimates a^*^"^^ with and p'p . Matrix A^'^'* = E^(r) [G^A^G^ is a M x M matrix 
similar to the matrix Hj introduced in the E-A step. Its {m,m') entry is given by gly^Aj^'^gm' + trace(A^^^Xm5]^^X^/) . 

Eventually, the maximization wrt pj leads to p^-^^ = arg max{ (trace (C/iA^) + trace (C/2Aj)) I cP'p^ + log |Aj|}, with |Aj| = 

P3 

l—pj and where Aj has the same expression as A^ without the (r) superscript. Matrices Ui and U2 are respectively MxM and 
N X N matrices defined as Ui = Xl^j +m^jm^j and U2 = t/^^^ (y^-^^ — 2Gm^j)*. The derivative, denoted by Aj of Aj wrt 
Pj writes A^ = 2pjB-\-C, where the entries of B and C are zero except {B)n,n which is 1 forn = 2 : (A^— 1) and for (C)n,n+i 
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and (C)n+i,n which are -1 forn = 1 : (N — l). The derivative, denoted by A^, of Aj wrt pj can be written as: = 2pjB-\-C 
where B and C are M x M matrices whose entries (m, m') are respectively {B)m,m' = trsLce((X^5]^^^X^/ +gf^/gf^)S) 
and (C)^,^/ = tr ace [{Xm^^H^Xl^f + gm'9^m)C). Eventually, the derivative wrt pj leads to: 

= ^f^{2pf^(trace(L/iB) +trace(t/2B)) +trace(t7iC) +trace(t/2C)} = (pf \ erf ) . 

Then p^p can be estimated as a solution of the fixed point equation p^ = F^^p^ ,F2{pP .Fi^p^))). Note that in the 
Gaussian noise case, the updating of the noise parameters reduces to the estimation of erf which simplifies into erf = 
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